9 research outputs found

Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

Author: Cheung Alvin
Kemper Alfons
Palkar Shoumik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/06/2018

MapReduce is a popular programming paradigm for developing large-scale, data-intensive computation. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become familiar with their APIs and rewrite existing code. Casper is a new tool that automatically translates sequential Java programs into the MapReduce paradigm. Casper identifies potential code fragments to rewrite and translates them in two steps: (1) Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each code fragment. The summary is expressed using a high-level intermediate language resembling the MapReduce paradigm and verified to be semantically equivalent to the original using a theorem prover. (2) Casper generates executable code from the summary, using either the Hadoop, Spark, or Flink API. We evaluated Casper by automatically converting real-world, sequential Java benchmarks to MapReduce. The resulting benchmarks perform up to 48.2x faster compared to the original.

Comment: 12 pages, additional 4 pages of references and appendi

arXiv.org e-Print Archive

Crossref

E2: a framework for NFV applications

Author: Han Sangjin
Jang Keon
Lan Chang
Palkar Shoumik
Panda Aurojit
Ratnasamy Sylvia
Rizzo Luigi
Shenker Scott
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

By moving network appliance functionality from proprietary hardware to software, Network Function Virtualization promises to bring the advantages of cloud computing to network packet processing. However, the evolution of cloud computing (particularly for data analytics) has greatly benefited from application-independent methods for scaling and placement that achieve high efficiency while relieving programmers of these burdens. NFV has no such general management solutions. In this paper, we present a scalable and application-agnostic scheduling framework for packet processing, and compare its performance to current approaches.

CiteSeerX

Archivio della Ricerca - Università di Pisa

E2: A Framework for NFV Applications

Author: Chang Lan
Keon Jang
Luigi Rizzo
Sangjin Han
Scott Shenker
Shoumik Palkar
Sylvia Ratnasamy
Publication date: 23/04/2020

By moving network appliance functionality from proprietary hardware to software, Network Function Virtualization promises to bring the advantages of cloud computing to network packet processing. However, the evolution of cloud computing (particularly for data analytics) has greatly benefited from application-independent methods for scaling and placement that achieve high efficiency while relieving programmers of these burdens. NFV has no such general management solutions. In this paper, we present a scalable and application-agnostic scheduling framework for packet processing, and compare its performance to current approaches.

CiteSeerX

Weld: A Common Runtime for High Performance Data Analytics

Author: Amarasinghe Saman P
Narayanan Deepak
Palkar Shoumik
Pirk Holger
Schwarzkopf Malte
Shanbhag Anil Atmanand
Thomas James J.
Zaharia Matei
Publication date: 23/11/2020

© 2017 Conference on Innovative Data Systems Research (CIDR). All rights reserved.

Modern analytics applications combine multiple functions from different libraries and frameworks to build increasingly complex workflows. Even though each function may achieve high performance in isolation, the performance of the combined workflow is often an order of magnitude below hardware limits due to extensive data movement across the functions. To address this problem, we propose Weld, a runtime for data-intensive applications that optimizes across disjoint libraries and functions. Weld uses a common intermediate representation to capture the structure of diverse data-parallel workloads, including SQL, machine learning and graph analytics. It then performs key data movement optimizations and generates efficient parallel code for the whole workflow. Weld can be integrated incrementally into existing frameworks like TensorFlow, Apache Spark, NumPy and Pandas without changing their user-facing APIs. We show that Weld can speed up these frameworks, as well as applications that combine them, by up to 30×.

DSpace@MIT